Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models

Neural Information Processing Systems

In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models (LLMs). LLMs have become increasingly powerful, but their large parameter counts make them computationally expensive. Existing pruning methods for compressing LLMs primarily focus on evaluating redundancies and removing weights element-wise. However, these methods fail to allocate adaptive layer-wise sparsities, which leads to performance degradation on challenging tasks. A sketch of what such an allocation looks like in practice is given below.
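To make the idea of a sparsity allocation concrete, here is a minimal sketch of layer-wise magnitude pruning in which each layer is assigned its own ratio. This is an illustration of the general technique, not the DSA method itself; the toy model, the `allocation` mapping, and the `prune_layer` helper are hypothetical names introduced for this example.

```python
import torch

def prune_layer(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Zero out the smallest-magnitude entries of `weight` at the given ratio."""
    k = int(weight.numel() * sparsity)
    if k == 0:
        return weight
    # kthvalue on the flattened magnitudes gives the pruning threshold.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

# A hypothetical non-uniform allocation: each Linear layer receives its own
# sparsity ratio instead of one global ratio shared by every layer.
model = torch.nn.Sequential(
    torch.nn.Linear(512, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 512),
)
allocation = {0: 0.3, 2: 0.6}  # module index -> sparsity ratio
with torch.no_grad():
    for idx, ratio in allocation.items():
        layer = model[idx]
        layer.weight.copy_(prune_layer(layer.weight, ratio))
```

Discovering a good per-layer allocation (the `allocation` dictionary above) automatically, rather than using one uniform ratio, is the problem the paper addresses.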

Appendix: Representation Learning Process

Neural Information Processing Systems

Here we provide additional experimental results. Specifically, we evaluate representational similarity by computing the CKA value between the same layer of the model (ResNet-32) trained with different sparsities at each epoch and that of the final model. In our work, we evaluate four different types of freezing schemes (Sec. ). In this case, the single-shot & resume scheme retains the same FLOPs reduction as the single-shot scheme, while the entire network can still be fine-tuned at the end of training with a small learning rate. For the periodic freezing scheme, we freeze the selected layers periodically at a given frequency, so that all layers/blocks are updated at different stages of the training process. Sketches of a linear CKA computation and of a periodic freezing schedule follow.
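For reference, linear CKA between two activation matrices can be computed as below. This is the standard linear-CKA formulation (Kornblith et al., 2019), not code taken from the paper; the activation tensors are random stand-ins for real layer outputs.

```python
import torch

def linear_cka(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between activation matrices of shape (n_samples, n_features)."""
    # Center each feature across the sample dimension.
    x = x - x.mean(dim=0, keepdim=True)
    y = y - y.mean(dim=0, keepdim=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F), per Kornblith et al. (2019).
    numerator = torch.linalg.matrix_norm(y.T @ x) ** 2
    denominator = torch.linalg.matrix_norm(x.T @ x) * torch.linalg.matrix_norm(y.T @ y)
    return numerator / denominator

# Example: compare one layer's activations from two checkpoints on the same batch.
acts_sparse = torch.randn(256, 64)  # stand-in for a pruned model's layer output
acts_final = torch.randn(256, 64)   # stand-in for the final model's layer output
print(linear_cka(acts_sparse, acts_final))
```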
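And a minimal sketch of the periodic freezing idea, where selected blocks are frozen only during alternating phases so every block is still updated at some stage of training. The phase logic, the `period` value, and the block grouping are assumptions made for illustration, not the paper's exact schedule.

```python
import torch

def apply_periodic_freezing(model: torch.nn.Sequential,
                            selected: set,
                            epoch: int,
                            period: int = 2) -> None:
    """Freeze the selected blocks only during alternating `period`-epoch
    phases, so every block still receives gradient updates at some stage.
    (Sketch: the alternating-phase rule is an assumption for illustration.)"""
    freeze_phase = (epoch // period) % 2 == 0
    for idx, block in enumerate(model):
        trainable = not (freeze_phase and idx in selected)
        for p in block.parameters():
            p.requires_grad_(trainable)

# Toggle the freezing pattern every `period` epochs inside the training loop.
model = torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.Linear(8, 8))
for epoch in range(6):
    apply_periodic_freezing(model, selected={0}, epoch=epoch, period=2)
    # ... run one epoch of training here ...
```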